Goto

Collaborating Authors

 contact point


MAPLE: Encoding Dexterous Robotic Manipulation Priors Learned From Egocentric Videos

Gavryushin, Alexey, Wang, Xi, Malate, Robert J. S., Yang, Chenyu, Liconti, Davide, Zurbrügg, René, Katzschmann, Robert K., Pollefeys, Marc

arXiv.org Artificial Intelligence

Large-scale egocentric video datasets capture diverse human activities across a wide range of scenarios, offering rich and detailed insights into how humans interact with objects, especially those that require fine-grained dexterous control. Such complex, dexterous skills with precise controls are crucial for many robotic manipulation tasks, yet are often insufficiently addressed by traditional data-driven approaches to robotic manipulation. To address this gap, we leverage manipulation priors learned from large-scale egocentric video datasets to improve policy learning for dexterous robotic manipulation tasks. We present MAPLE, a novel method for dexterous robotic manipulation that learns features to predict object contact points and detailed hand poses at the moment of contact from egocentric images. We then use the learned features to train policies for downstream manipulation tasks. Experimental results demonstrate the effectiveness of MAPLE across 4 existing simulation benchmarks, as well as a newly designed set of 4 challenging simulation tasks requiring fine-grained object control and complex dexterous skills. The benefits of MAPLE are further highlighted in real-world experiments using a 17 DoF dexterous robotic hand, whereas the simultaneous evaluation across both simulation and real-world experiments has remained underexplored in prior work. We additionally showcase the efficacy of our model on an egocentric contact point prediction task, validating its usefulness beyond dexterous manipulation policy learning.

  Country:
  Industry: Health & Medicine (0.46)

Object Reconstruction under Occlusion with Generative Priors and Contact-induced Constraints

Zhu, Minghan, Wang, Zhiyi, Sun, Qihang, Ghaffari, Maani, Posa, Michael

arXiv.org Artificial Intelligence

Abstract-- Object geometry is key information for robot manipulation. Y et, object reconstruction is a challenging task because cameras only capture partial observations of objects, especially when occlusion occurs. In this paper, we leverage two extra sources of information to reduce the ambiguity of vision signals. First, generative models learn priors of the shapes of commonly seen objects, allowing us to make reasonable guesses of the unseen part of geometry. Second, contact information, which can be obtained from videos and physical interactions, provides sparse constraints on the boundary of the geometry. We combine the two sources of information through contact-guided 3D generation. The guidance formulation is inspired by drag-based editing in generative models. Experiments on synthetic and real-world data show that our approach improves the reconstruction compared to pure 3D generation and contact-based optimization.


OmniDexVLG: Learning Dexterous Grasp Generation from Vision Language Model-Guided Grasp Semantics, Taxonomy and Functional Affordance

Zhang, Lei, Zheng, Diwen, Bai, Kaixin, Bing, Zhenshan, Marton, Zoltan-Csaba, Chen, Zhaopeng, Knoll, Alois Christian, Zhang, Jianwei

arXiv.org Artificial Intelligence

Dexterous grasp generation aims to produce grasp poses that align with task requirements and human interpretable grasp semantics. However, achieving semantically controllable dexterous grasp synthesis remains highly challenging due to the lack of unified modeling of multiple semantic dimensions, including grasp taxonomy, contact semantics, and functional affordance. To address these limitations, we present OmniDexVLG, a multimodal, semantics aware grasp generation framework capable of producing structurally diverse and semantically coherent dexterous grasps under joint language and visual guidance. Our approach begins with OmniDexDataGen, a semantic rich dexterous grasp dataset generation pipeline that integrates grasp taxonomy guided configuration sampling, functional affordance contact point sampling, taxonomy aware differential force closure grasp sampling, and physics based optimization and validation, enabling systematic coverage of diverse grasp types. We further introduce OmniDexReasoner, a multimodal grasp type semantic reasoning module that leverages multi agent collaboration, retrieval augmented generation, and chain of thought reasoning to infer grasp related semantics and generate high quality annotations that align language instructions with task specific grasp intent. Building upon these components, we develop a unified Vision Language Grasping generation model that explicitly incorporates grasp taxonomy, contact structure, and functional affordance semantics, enabling fine grained control over grasp synthesis from natural language instructions. Extensive experiments in simulation and real world object grasping and ablation studies demonstrate that our method substantially outperforms state of the art approaches in terms of grasp diversity, contact semantic diversity, functional affordance diversity, and semantic consistency.


Simulation of Active Soft Nets for Capture of Space Debris

Costi, Leone, Izzo, Dario

arXiv.org Artificial Intelligence

In this work, we propose a simulator, based on the open-source physics engine MuJoCo, for the design and control of soft robotic nets for the autonomous removal of space debris. The proposed simulator includes net dynamics, contact between the net and the debris, self-contact of the net, orbital mechanics, and a controller that can actuate thrusters on the four satellites at the corners of the net. It showcases the case of capturing Envisat, a large ESA satellite that remains in orbit as space debris following the end of its mission. This work investigates different mechanical models, which can be used to simulate the net dynamics, simulating various degrees of compliance, and different control strategies to achieve the capture of the debris, depending on the relative position of the net and the target. Unlike previous works on this topic, we do not assume that the net has been previously ballistically thrown toward the target, and we start from a relatively static configuration. The results show that a more compliant net achieves higher performance when attempting the capture of Envisat. Moreover, when paired with a sliding mode controller, soft nets are able to achieve successful capture in 100% of the tested cases, whilst also showcasing a higher effective area at contact and a higher number of contact points between net and Envisat.


Dexterous Manipulation Transfer via Progressive Kinematic-Dynamic Alignment

Bai, Wenbin, Chen, Qiyu, Lin, Xiangbo, Li, Jianwen, Li, Quancheng, Pan, Hejiang, Sun, Yi

arXiv.org Artificial Intelligence

The inherent difficulty and limited scalability of collecting manipulation data using multi-fingered robot hand hardware platforms have resulted in severe data scarcity, impeding research on data-driven dexterous manipulation policy learning. To address this challenge, we present a hand-agnostic manipulation transfer system. It efficiently converts human hand manipulation sequences from demonstration videos into high-quality dexterous manipulation trajectories without requirements of massive training data. To tackle the multi-dimensional disparities between human hands and dexterous hands, as well as the challenges posed by high-degree-of-freedom coordinated control of dexterous hands, we design a progressive transfer framework: first, we establish primary control signals for dexterous hands based on kinematic matching; subsequently, we train residual policies with action space rescaling and thumb-guided initialization to dynamically optimize contact interactions under unified rewards; finally, we compute wrist control trajectories with the objective of preserving operational semantics. Using only human hand manipulation videos, our system automatically configures system parameters for different tasks, balancing kinematic matching and dynamic optimization across dexterous hands, object categories, and tasks. Extensive experimental results demonstrate that our framework can automatically generate smooth and semantically correct dexterous hand manipulation that faithfully reproduces human intentions, achieving high efficiency and strong generalizability with an average transfer success rate of 73%, providing an easily implementable and scalable method for collecting robot dexterous manipulation data.


Collaborative Multi-Robot Non-Prehensile Manipulation via Flow-Matching Co-Generation

Shaoul, Yorai, Chen, Zhe, Mohamed, Mohamed Naveed Gul, Pecora, Federico, Likhachev, Maxim, Li, Jiaoyang

arXiv.org Artificial Intelligence

Coordinating a team of robots to reposition multiple objects in cluttered environments requires reasoning jointly about where robots should establish contact, how to manipulate objects once contact is made, and how to navigate safely and efficiently at scale. Prior approaches typically fall into two extremes -- either learning the entire task or relying on privileged information and hand-designed planners -- both of which struggle to handle diverse objects in long-horizon tasks. To address these challenges, we present a unified framework for collaborative multi-robot, multi-object non-prehensile manipulation that integrates flow-matching co-generation with anonymous multi-robot motion planning. Within this framework, a generative model co-generates contact formations and manipulation trajectories from visual observations, while a novel motion planner conveys robots at scale. Crucially, the same planner also supports coordination at the object level, assigning manipulated objects to larger target structures and thereby unifying robot- and object-level reasoning within a single algorithmic framework. Experiments in challenging simulated environments demonstrate that our approach outperforms baselines in both motion planning and manipulation tasks, highlighting the benefits of generative co-design and integrated planning for scaling collaborative manipulation to complex multi-agent, multi-object settings. Visit gco-paper.github.io for code and demonstrations.



Lightning Grasp: High Performance Procedural Grasp Synthesis with Contact Fields

Yin, Zhao-Heng, Abbeel, Pieter

arXiv.org Artificial Intelligence

Despite years of research, real-time diverse grasp synthesis for dexterous hands remains an unsolved core challenge in robotics and computer graphics. We present Lightning Grasp, a novel high-performance procedural grasp synthesis algorithm that achieves orders-of-magnitude speedups over state-of-the-art approaches, while enabling unsupervised grasp generation for irregular, tool-like objects. The method avoids many limitations of prior approaches, such as the need for carefully tuned energy functions and sensitive initialization. This breakthrough is driven by a key insight: decoupling complex geometric computation from the search process via a simple, efficient data structure - the Contact Field. This abstraction collapses the problem complexity, enabling a procedural search at unprecedented speeds. We open-source our system to propel further innovation in robotic manipulation.


TAPOM: Task-Space Topology-Guided Motion Planning for Manipulating Elongated Object in Cluttered Environments

Li, Zihao, Zhu, Yiming, Zhong, Zhe, Ren, Qinyuan, Huang, Yijiang

arXiv.org Artificial Intelligence

To explore topologically complex free spaces and identify critical pathways, task-space topology analysis is employed to explicitly model free space connectivity and find critical regions. Due to the sampling inefficiency encountered when planning through narrow passages in high-dimensional C-space, a keyframe-guided sampling-based planner is developed that leverages topological insights from high-level analysis to explore C-space. Experimental validation is conducted demonstrating the effectiveness and efficiency of proposed method compared to state-of-the-art planning baselines on manipulation tasks involving elongated objects and narrow passages. Remainder of the article is organized as follows. Section II formally defines the planning problem. Section III details the proposed topology-aware high-level planning approach. In Section IV, the method for low-level path generation is presented. Section V describes experimental setup and results used to evaluate the performance of proposed method. Finally, Section VI provides a brief summary of the work and discusses directions for future research.


ConPoSe: LLM-Guided Contact Point Selection for Scalable Cooperative Object Pushing

Steinkrüger, Noah, Nilavadi, Nisarga, Burgard, Wolfram, Kaiser, Tanja Katharina

arXiv.org Artificial Intelligence

Abstract-- Object transportation in cluttered environments is a fundamental task in various domains, including domestic service and warehouse logistics. In cooperative object transport, multiple robots must coordinate to move objects that are too large for a single robot. One transport strategy is pushing, which only requires simple robots. However, careful selection of robot-object contact points is necessary to push the object along a preplanned path. Although this selection can be solved analytically, the solution space grows combinatorially with the number of robots and object size, limiting scalability. Inspired by how humans rely on common-sense reasoning for cooperative transport, we propose combining the reasoning capabilities of Large Language Models with local search to select suitable contact points. Our LLM-guided local search method for con tact po int se lection, ConPoSe, successfully selects contact points for a variety of shapes, including cuboids, cylinders, and T -shapes. We demonstrate that ConPoSe scales better with the number of robots and object size than the analytical approach, and also outperforms pure LLM-based selection. I. INTRODUCTION Object transportation is a fundamental task in many robotic applications.